Morfessor FlatCat: An HMM-Based Method for Unsupervised and Semi-Supervised Learning of Morphology
نویسندگان
چکیده
Morfessor is a family of methods for learning morphological segmentations of words based on unannotated data. We introduce a new variant of Morfessor, FlatCat, that applies a hidden Markov model structure. It builds on previous work on Morfessor, sharing model components with the popular Morfessor Baseline and Categories-MAP variants. Our experiments show that while unsupervised FlatCat does not reach the accuracy of Categories-MAP, with semisupervised learning it provides state-of-the-art results in the Morpho Challenge 2010 tasks for English, Finnish, and Turkish.
منابع مشابه
Semi-Supervised Learning of Concatenative Morphology
We consider morphology learning in a semi-supervised setting, where a small set of linguistic gold standard analyses is available. We extend Morfessor Baseline, which is a method for unsupervised morphological segmentation, to this task. We show that known linguistic segmentations can be exploited by adding them into the data likelihood function and optimizing separate weights for unlabeled and...
متن کاملSemi-supervised extensions to Morfessor Baseline
We have extended Morfessor Baseline, which is a well-known method for unsupervised morphological segmentation, to semi-supervised learning. As submission to Morpho Challenge 2010, we provide results from three methods: The first one is based on the unsupervised algorithm, but includes a weight parameter that can be used to control the amount of segmentation. The second one applies the semisuper...
متن کاملLearning Constructions of Natural Language: Statistical Models and Evaluations
Aalto University, P.O. Box 11000, FI-00076 Aalto www.aalto.fi Author Sami Virpioja Name of the doctoral dissertation Learning Constructions of Natural Language: Statistical Models and Evaluations Publisher School of Science Unit Department of Information and Computer Science Series Aalto University publication series DOCTORAL DISSERTATIONS 158/2012 Field of research Computer and Information Sci...
متن کاملLow-Resource Active Learning of Morphological Segmentation
Many Uralic languages have a rich morphological structure, but lack morphological analysis tools needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation with a ...
متن کاملAdvances in Weakly Supervised Learning of Morphology
Aalto University, P.O. Box 11000, FI-00076 Aalto www.aalto.fi Author Oskar Kohonen Name of the doctoral dissertation Advances in Weakly Supervised Learning of Morphology Publisher School of Science Unit Department of Computer Science Series Aalto University publication series DOCTORAL DISSERTATIONS 91/2015 Field of research Language Technology Manuscript submitted 19 January 2014 Date of the de...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014